Maybe you’d like a colleague to explore your findings. Or maybe
you’re a teacher with an exercise you’d like your students to review and
replicate. In the past, if you wanted someone to use the same IPUMS
data that you did, you would need to provide a list of samples and
variables and instructions for your collaborator on how to navigate the
online data extract system.
In this post, I’ll first introduce the ipumsr functions for saving an
extract definition to a .json file and loading a saved definition from
.json. Then, I’ll demonstrate two use cases for those functions: sharing
an analysis in an R Markdown document and sharing an interactive
application created with Shiny. Note that the code examples
here will only work once you’ve requested beta access to the IPUMS
microdata API by emailing ipums+api@umn.edu and set up
your API key.
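Here is a minimal sketch of the saving and loading functions as they appear in recent ipumsr releases, where they are called save_extract_as_json() and define_extract_from_json(). The extract is a hypothetical placeholder: the description, the us2019a sample ID, and the AGE, SEX, and MIGRATE1 variables were chosen only for illustration. Argument names in the beta version this post was written against may differ, so check ?define_extract_from_json in your installed version. Defining and saving an extract happens locally. Only submitting it requires your API key.

```r
library(ipumsr)

# Hypothetical extract definition: the sample ID and variables are
# placeholders for illustration, not the extract analyzed in this post.
extract <- define_extract_usa(
  description = "Migration example",
  samples     = "us2019a",
  variables   = c("AGE", "SEX", "MIGRATE1")
)

# Save the definition to a .json file that you can send to a collaborator.
json_path <- file.path(tempdir(), "migration_example.json")
save_extract_as_json(extract, json_path)

# On the collaborator's machine: rebuild the same definition from the file.
shared_extract <- define_extract_from_json(json_path)
shared_extract$description
```

The collaborator could then pass shared_extract to submit_extract() to request their own copy of the data. That is the step that needs an API key.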
Sharing an analysis in R Markdown
R Markdown is a plain-text file format that allows you to combine prose, code, and
analysis output into one document. To help users share an analysis of
IPUMS data in an R Markdown document, we’ve created a new R Markdown
template, the “.Rmd for Reproducible Research” (RRR). You can download
the template as a standalone file here, or you can install the
development version of ipumsr (by following the instructions here)
and access the template through the RStudio menu interface, as described
below.
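If you choose the development version, installing it from GitHub typically looks like the sketch below. This assumes the development version lives in the ipums/ipumsr GitHub repository and that the remotes package can be installed from CRAN. The linked instructions remain the authoritative source.

```r
# Install remotes if needed, then the development version of ipumsr.
# Assumption: the development code lives in the ipums/ipumsr GitHub repo.
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("ipums/ipumsr")

# Restart R afterwards, then confirm which version is installed.
packageVersion("ipumsr")
```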
The beauty of the RRR is that it allows your collaborators to run
your analysis out-of-the-box, without taking any separate steps to
download the data. How does it accomplish this? Let’s take a look.
The first step in using the RRR workflow is to create a data extract.
While it is possible to create extracts entirely within R (more
on that here), many users (this author included) may want to use the
online IPUMS extract system to create and submit their extracts. Once
you’ve submitted your extract, take note of the extract number, then
begin working with the RRR as follows.
In RStudio, select File > New File > R Markdown.
In the popup menu, select From Template in the left sidebar, then
Rmd for Reproducible Research from the list of templates, and click
OK.
Now here we are, looking at a wall of instructions.
But don’t worry! We’ve tried to make this as painless as possible. In
just a few steps you’ll have your IPUMS data downloaded and the
framework for a shareable analysis project. First, scroll down to the
first code chunk, labeled “project-parameters”, and fill in values for
the four parameters defined there: the IPUMS collection
and extract number of your submitted extract, a descriptive name for
your extract, and a subfolder in which to save your data files.
In fact, you can leave all the default values of these parameters if
you want to analyze your most recent IPUMS USA extract, though I’d
recommend filling in a better descriptive_name for the
extract even in that case.
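To make these parameters concrete, here is a hypothetical stand-in for the “project-parameters” chunk. The post itself names only descriptive_name (default “template”) and the Data subfolder. The other variable names, the NA placeholder for “most recent extract”, and the file-naming rule are assumptions for illustration, not the template’s actual code.

```r
# Hypothetical stand-in for the "project-parameters" chunk. Only
# descriptive_name and the "Data" folder come from the post; the other
# names and the NA convention are assumptions.
collection       <- "usa"          # IPUMS collection of the submitted extract
extract_number   <- NA_integer_    # assumption: NA means "my most recent extract"
descriptive_name <- "pr_migration" # hypothetical; the template default is "template"
data_dir         <- "Data"         # subfolder for the downloaded files

# Assumed naming rule: files written by the template are keyed on
# descriptive_name, so changing it changes the file names.
json_file <- file.path(data_dir, paste0(descriptive_name, ".json"))
json_file
```

Keying file names on descriptive_name is what makes a meaningful name worthwhile: it is the label your collaborators will see on the shared .json file.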
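After filling in values, save the file, then click the RStudio “Knit”
button, and awaaaaaaay it goes! Don’t panic if the first knit ends with a
message instead of a finished report: the RRR is designed to be knit a few
times while IPUMS processes your extract, as the walkthrough below shows.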
Rather than shipping the data itself, the RRR uses the .json extract
definition to create and submit a new data extract the first time the
script is run. By sharing as few as two files, you can allow a colleague or
student to download the exact same IPUMS data you used in order to
replicate or further explore your work. We hope this helps make research
more accessible and replicable. Read on to see the RRR in action, as we
explore some data from the Puerto Rican
Community Survey, available from IPUMS USA.
The basic assumptions of the template are that you:
- Have registered with IPUMS USA (or IPUMS CPS)
- Have generated an IPUMS API key
- Have added that key to your .Renviron
- Have a specific dataset you want to download, analyze, and visualize
- Would like to let other IPUMS users replicate the work
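To cover the API key assumptions, here is one way to store the key. This sketch assumes a recent ipumsr release that provides set_ipums_api_key() and that ipumsr reads the key from the IPUMS_API_KEY environment variable. The key string is a placeholder.

```r
# Option 1 (recent ipumsr): write the key to the .Renviron file in your home
# directory. "paste-your-api-key-here" is a placeholder, not a real key.
# ipumsr::set_ipums_api_key("paste-your-api-key-here", save = TRUE)

# Option 2: add this line to ~/.Renviron by hand, then restart R.
# IPUMS_API_KEY="paste-your-api-key-here"

# After restarting R, check that the key is visible to the session.
has_ipums_key <- nzchar(Sys.getenv("IPUMS_API_KEY"))
has_ipums_key
```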
For this example, we’re using IPUMS USA data, specifically looking at
the Puerto Rican Community Survey from the years 2015-2019.
To get started in R, make sure to update ipumsr, then
select our new RRR template.


Now here we are, looking at a (possibly) overwhelming amount of code.
But don’t worry! We’ve tried to make this as painless as possible. In
fact, this template can run out of the box, defaulting to IPUMS
USA and your most recent extract. Since this
just so happens to be the extract I’d like to work with, I can
proceed without making any edits, simply by clicking
Knit or running rmarkdown::render().

And awaaaaaaay it goes! All that’s left to do is sit back, relax, and
…wait, is that an error???

No! As both the ‘error message’ and our friend Gimli
indicate, that’s not an error! The RRR is set up to be
run (or knit) a few times, at your leisure, so that the IPUMS servers
have time to process your data request. But look, something did happen:
the RRR has added a subfolder named Data, and within it are two new
files, a .json extract definition and a chk_....csv file.
The first file contains all the information needed to get your data (or
to share with friends and loved ones); the second file you don’t need to
worry about!
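Why knit several times? One way to build a workflow like this is to have each run check which files already exist and then take the next step. The sketch below is hypothetical and is not the RRR’s actual code. The file names, the contents of the chk_ file, and the choice to always submit from the .json are all assumptions. It uses ipumsr’s save_extract_as_json(), define_extract_from_json(), submit_extract(), is_extract_ready(), download_extract(), and read_ipums_micro(), along with the “collection:number” extract IDs accepted by recent ipumsr releases.

```r
library(ipumsr)

# Hypothetical file names; the RRR's actual names and chk_ file format may differ.
json_file <- file.path("Data", "pr_migration.json")
chk_file  <- file.path("Data", "chk_pr_migration.csv")

# Decide what this run should do, based only on which files already exist.
next_step <- function(json_file, chk_file) {
  if (!file.exists(json_file)) return("save_definition")
  if (!file.exists(chk_file)) return("submit")
  "check_and_download"
}

# One pass of the workflow; call it once per knit.
# `source_extract` is only needed on the very first run.
run_once <- function(json_file, chk_file, source_extract = NULL) {
  step <- next_step(json_file, chk_file)

  if (step == "save_definition") {
    dir.create(dirname(json_file), showWarnings = FALSE, recursive = TRUE)
    save_extract_as_json(source_extract, json_file)
    message("Saved the extract definition; run again to submit it.")
    return(invisible(NULL))
  }

  if (step == "submit") {
    submitted <- submit_extract(define_extract_from_json(json_file))
    # Record which extract was submitted so the next run can find it.
    write.csv(
      data.frame(collection = submitted$collection, number = submitted$number),
      chk_file, row.names = FALSE
    )
    message("Extract submitted; run again once IPUMS has processed it.")
    return(invisible(NULL))
  }

  # Look up the recorded extract and download it only if it is ready.
  chk <- read.csv(chk_file)
  extract_id <- paste0(chk$collection[1], ":", chk$number[1])
  if (!is_extract_ready(extract_id)) {
    message("Extract not ready yet; run again later.")
    return(invisible(NULL))
  }
  ddi_path <- download_extract(extract_id, download_dir = dirname(json_file))
  read_ipums_micro(ddi_path)
}
```

On the first run you might pass get_extract_info("usa:1") as source_extract, replacing the 1 with your own extract number. Later runs need only the two paths. A fuller version would also skip the download once the data files exist.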



Now, you may have noticed that these files are both called
“template,” and you might be wondering why. “template” is the default
value of the descriptive_name parameter. You’ll want to change it, which
is easy to do in the first code chunk of the RRR (depending on your
window and font size, you may need to scroll to find it), or you can
use the Table of Contents to jump down to
Setup-Project Parameters. We’ll set it to a more
descriptive name, since our main focus will be migration rates in Puerto
Rico.


Since we’ve changed the descriptive_name parameter, it’s
helpful to delete the .json and chk_.csv files
with the old name, “template”, before proceeding (if you set a proper
descriptive name in the first place, you would not need to delete
anything). With the name updated, awaaaay we knit!
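If you’d rather clear out the old files from R than by hand, a small helper like the hypothetical one below could do it. It assumes the files live in the Data subfolder and contain the old descriptive name (“template”) in their file names. By default it only lists matches, because the pattern is treated as a regular expression and could match more than you expect.

```r
# Hypothetical helper: find (and optionally delete) files in the data folder
# whose names contain the old descriptive name. `old_name` is treated as a
# regular expression, so review the matches before deleting.
remove_old_files <- function(data_dir = "Data", old_name = "template",
                             dry_run = TRUE) {
  old_files <- list.files(data_dir, pattern = old_name, full.names = TRUE)
  if (!dry_run && length(old_files) > 0) {
    file.remove(old_files)
  }
  old_files
}

# First look, then delete:
# remove_old_files()
# remove_old_files(dry_run = FALSE)
```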
And with just two clicks, we’ve pulled our most recent IPUMS USA data
DIRECTLY into our R project! (I really can’t overstate how cool this feature
is.) You’ll notice there’s some basic descriptive information included
by default. Feel free to replace it as you develop your analyses.


From here, we can fill out the remainder of the RRR with
whatever analysis we’d like, such as plotting migration rates over time.
To see the full features of the RRR, check out
github.com/ipums/simple-api-shiny-app.
Clone the repo to try out the interactive tabset .HTML report for
yourself, or check out the pre-rendered version, though many features,
such as code folding, are not available in that version.
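As a starting point for the migration-rate analysis mentioned above, here is a hypothetical base-R sketch. It is not the analysis in the linked repository. It assumes the extract includes the IPUMS USA variables YEAR, PERWT, and MIGRATE1, with MIGRATE1 using code 1 for “same house”, codes 2 to 4 for the different kinds of moves, and 0 and 9 for N/A and unknown. Check those codes against the variable’s documentation before relying on them.

```r
# Hypothetical analysis step: weighted share of people who moved in the past
# year, by survey year. Assumes `data` came from read_ipums_micro() and
# includes YEAR, PERWT, and MIGRATE1 with the codes described above.
migration_rate_by_year <- function(data) {
  migrate <- as.numeric(data$MIGRATE1)  # drop value labels, keep the codes
  keep    <- migrate %in% 1:4           # exclude N/A (0) and unknown (9)
  moved   <- migrate[keep] %in% 2:4     # any kind of move in the past year
  weight  <- as.numeric(data$PERWT)[keep]
  year    <- as.numeric(data$YEAR)[keep]
  # Weighted movers divided by weighted total, computed separately per year
  tapply(weight * moved, year, sum) / tapply(weight, year, sum)
}

# Usage, once the extract has been downloaded and read:
# rates <- migration_rate_by_year(data)
# plot(as.numeric(names(rates)), rates, type = "b",
#      xlab = "Year", ylab = "Share who moved in the past year")
```

Person weights (PERWT) are used so the rates describe the population rather than the sample.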




This template is very much in beta, so be sure to share your
feedback by emailing us at ipums+cran@umn.edu or creating an issue on
GitHub. As an even-more-beta bonus, we’ve included a simple Shiny
app, the Variable Variation Value Viewer (VVVV), which uses these
functions in a similar way to create a self-compiling web app.